Multi-model deploy and test with wrapper by asm582 · Pull Request #1014 · llm-d/llm-d-workload-variant-autoscaler

asm582 · 2026-04-15T13:11:45Z

This PR introduces a new design for installing multi-model infrastructure through a wrapper around the existing infra install scripts. This avoids changing the existing e2e infrastructure that several other tests depend on. It also adds a separate make target for running multi-model tests, kept in sync with the existing single-model benchmark tests. Gemini was used to help with coding.
This is mostly dormant code at this point and is not yet connected to CI. The following commands deploy and test this PR in a single namespace:

make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
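For illustration only, here is a minimal Go sketch of how a wrapper might interpret the comma-separated `MODELS` value passed to all three targets. The helper `parseModels` is hypothetical and not taken from this PR. Treating the first model separately from the remaining ones is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseModels splits a MODELS value such as
// "Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" into trimmed, de-duplicated
// model IDs, preserving the order given on the command line.
// Hypothetical helper for illustration; not the PR's implementation.
func parseModels(raw string) ([]string, error) {
	seen := make(map[string]bool)
	var models []string
	for _, part := range strings.Split(raw, ",") {
		m := strings.TrimSpace(part)
		if m == "" || seen[m] {
			continue // skip empty entries (e.g. trailing commas) and duplicates
		}
		seen[m] = true
		models = append(models, m)
	}
	if len(models) == 0 {
		return nil, fmt.Errorf("MODELS is empty")
	}
	return models, nil
}

func main() {
	models, err := parseModels("Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B")
	if err != nil {
		panic(err)
	}
	// Assumed split: the first model goes through the existing single-model
	// install path, and the rest are layered on top of it.
	fmt.Println("first:", models[0])
	fmt.Println("additional:", models[1:])
}
```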

asm582 · 2026-04-15T13:12:00Z

/ok-to-test

github-actions · 2026-04-15T13:12:14Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-15T13:12:17Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-04-15T14:31:42Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	39	11

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)
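The pre-flight tables reduce to simple arithmetic: available = total − allocated (50 − 39 = 11 here, and 50 − 49 = 1 in the failing run below). The sketch below shows that check in Go. It assumes the gate compares available GPUs against the 4-GPU minimum rather than the 6-GPU recommendation. The bot comments do not say which threshold is used, and every run in this thread is consistent with either. `gpuPreflight` is a hypothetical name.

```go
package main

import "fmt"

// gpuPreflight computes free GPUs as total minus allocated and reports
// whether at least minRequired are free.
// Assumption: the real check uses the minimum, not the recommended count.
func gpuPreflight(total, allocated, minRequired int) (available int, ok bool) {
	available = total - allocated
	return available, available >= minRequired
}

func main() {
	// Snapshots taken from the bot comments in this thread.
	for _, snap := range []struct{ total, allocated int }{
		{50, 39},
		{50, 49},
		{50, 40},
	} {
		avail, ok := gpuPreflight(snap.total, snap.allocated, 4)
		fmt.Printf("allocated=%d available=%d pass=%v\n", snap.allocated, avail, ok)
	}
}
```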

lionelvillard · 2026-04-15T14:43:47Z

/ok-to-test

github-actions · 2026-04-15T14:43:59Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-15T14:46:12Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-04-15T14:51:38Z

GPU Pre-flight Check ❌

Insufficient GPUs to run OpenShift E2E. Re-run with /retest (OpenShift E2E) when GPUs free up.

Resource	Total	Allocated	Available
GPUs	50	49	1

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

github-actions · 2026-04-15T15:46:49Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	39	11

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

github-actions · 2026-04-15T15:57:57Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	40	10

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

lionelvillard · 2026-04-15T16:42:54Z

/ok-to-test

github-actions · 2026-04-15T16:43:07Z

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

github-actions · 2026-04-15T16:43:13Z

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

github-actions · 2026-04-15T16:46:08Z

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource	Total	Allocated	Available
GPUs	50	39	11

Cluster	Value
Nodes	16 (7 with GPUs)
Total CPU	993 cores
Total Memory	10383 Gi
GPUs required	4 (min) / 6 (recommended)

kahilam

Ignore my previous review for now.

Addresses review feedback from #1014 by moving the deployment logic from bash scripts to Go, for readability, type safety, and concurrent model deployment. Key improvements:
- Models 2..N deploy concurrently via goroutines (bash was sequential)
- Connectivity verification uses kubectl port-forward from the Go process, eliminating the in-cluster curl Job and its Docker Hub image (curlimages/curl:latest)
- Kubernetes resources (Gateway, HTTPRoute) are created via the dynamic client instead of heredoc YAML
- Proper error handling and structured logging

The Go tool is invoked via `go run ./deploy/multimodel` from the same Makefile targets (deploy-multi-model-infra, undeploy-multi-model-infra).

Made-with: Cursor

multi-model code with wrapper

d1debfb

asm582 changed the title ~~Multi-model code with wrapper~~ Multi-model deploy and test with wrapper Apr 15, 2026

asm582 requested review from kahilam and lionelvillard April 15, 2026 13:41

asm582 mentioned this pull request Apr 15, 2026

feat: infrastructure support for multi-model deployment and isolated gateway routing #1007

Closed

asm582 closed this Apr 15, 2026

asm582 reopened this Apr 15, 2026

fix lint issue

708730d

missed linter file

855e6c9

lionelvillard reviewed Apr 15, 2026


Comment thread deploy/install-multi-model.sh

kahilam reviewed Apr 15, 2026


lionelvillard approved these changes Apr 15, 2026


asm582 merged commit 036dc2a into main Apr 15, 2026
17 checks passed

asm582 deleted the multi-model-indp branch April 15, 2026 17:32

kahilam mentioned this pull request Apr 15, 2026

Convert multi-model deploy script from bash to Go #1015

Open

asm582 mentioned this pull request Apr 15, 2026

[Feature]: Multiple base model deployment llm-d-incubation/llm-d-modelservice#253

Open


3 participants
